 third-party cookie


Differentially Private Synthetic Data Release for Topics API Outputs

Dick, Travis, Epasto, Alessandro, Javanmard, Adel, Karlin, Josh, Medina, Andres Munoz, Mirrokni, Vahab, Vassilvitskii, Sergei, Zhong, Peilin

arXiv.org Artificial Intelligence

The analysis of the privacy properties of Privacy-Preserving Ads APIs is an area of research that has received strong interest from academics, industry, and regulators. Despite this interest, the empirical study of these methods is hindered by the lack of publicly available data. Reliable empirical analysis of the privacy properties of an API, in fact, requires access to a dataset consisting of realistic API outputs; however, privacy concerns prevent the general release of such data to the public. In this work, we develop a novel methodology to construct synthetic API outputs that are simultaneously realistic enough to enable accurate study and provide strong privacy protections. We focus on one of the Privacy-Preserving Ads APIs: the Topics API, part of Google Chrome's Privacy Sandbox. We develop a methodology to generate a differentially-private dataset that closely matches the re-identification risk properties of the real Topics API data. The use of differential privacy provides strong theoretical bounds on the leakage of private user information from this release. Our methodology is based on first computing a large number of differentially-private statistics describing how output API traces evolve over time. Then, we design a parameterized distribution over sequences of API traces and optimize its parameters so that they closely match the statistics obtained. Finally, we create the synthetic data by drawing from this distribution. Our work is complemented by an open-source release of the anonymized dataset obtained by this methodology. We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset. We believe that this work will contribute to fostering transparency regarding the privacy properties of Privacy-Preserving Ads APIs.
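The abstract outlines a three-step recipe: release differentially private statistics of real traces, fit a parameterized trace distribution to those statistics, and sample synthetic traces from it. The toy sketch below illustrates that shape only; the topic counts, the Laplace mechanism on per-epoch marginals, and the independent-epoch model are illustrative assumptions, not the paper's actual construction.

```python
# Minimal sketch (not the authors' implementation) of the three-step recipe:
# (1) compute differentially private statistics over real traces,
# (2) fit a parameterized trace distribution to those statistics,
# (3) sample synthetic traces from the fitted distribution.
import numpy as np

rng = np.random.default_rng(0)
NUM_TOPICS, NUM_EPOCHS, NUM_USERS = 5, 3, 1_000

# Stand-in for real Topics API traces: one topic per user per epoch.
real_traces = rng.integers(0, NUM_TOPICS, size=(NUM_USERS, NUM_EPOCHS))

# Step 1: per-epoch topic counts released with Laplace noise (each user
# contributes one count per epoch, so the sensitivity of each count is 1).
epsilon = 1.0
dp_counts = np.empty((NUM_EPOCHS, NUM_TOPICS))
for t in range(NUM_EPOCHS):
    counts = np.bincount(real_traces[:, t], minlength=NUM_TOPICS)
    dp_counts[t] = counts + rng.laplace(scale=1.0 / epsilon, size=NUM_TOPICS)

# Step 2: fit a simple parameterized model (independent per-epoch topic
# distributions) so that its marginals match the DP statistics.
probs = np.clip(dp_counts, 0, None)
probs /= probs.sum(axis=1, keepdims=True)

# Step 3: draw synthetic traces from the fitted distribution.
synthetic = np.stack(
    [rng.choice(NUM_TOPICS, size=NUM_USERS, p=probs[t]) for t in range(NUM_EPOCHS)],
    axis=1,
)
print(synthetic[:3])
```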


The Morning After: Condé Nast is the latest media company to accuse AI search engine Perplexity of plagiarism

Engadget

Condé Nast, the media giant that owns The New Yorker, Vogue and Wired, has sent a cease-and-desist letter to AI-powered search startup Perplexity, according to The Information. The letter, sent on Monday, demanded Perplexity stop using content from Condé Nast publications in its AI-generated responses and accused the startup of plagiarism. It comes a month after Forbes took similar action. Condé Nast CEO Roger Lynch has warned "many" media companies could face financial ruin in the time it would take for litigation against generative AI companies to conclude. Lynch has called upon Congress to take "immediate action."


Customisable Algorithms: an ad stack supercharger - TechNative

#artificialintelligence

In the face of a challenging macroeconomic climate, the UK digital advertising market remains remarkably strong, expected to reach $35.43bn by the end of this year. With advertisers increasingly relying on digital channels to drive brand awareness and sales, platforms like Connected TV (CTV), digital audio and digital out-of-home (DOOH) are picking up a larger slice of the ad spend pie. Just a few years ago, this investment would traditionally have been allocated to offline media. This is not to say that the industry is without its problems, however. The economic situation, amongst other geopolitical pressures, is having an adverse effect on the sector, forcing media planners to think more short-term and reactively.


Privacy Aware Experiments without Cookies

Shankar, Shiv, Sinha, Ritwik, Mitra, Saayan, Swaminathan, Viswanathan, Mahadevan, Sridhar, Sinha, Moumita

arXiv.org Artificial Intelligence

Consider two brands that want to jointly test alternate web experiences for their customers with an A/B test. Such collaborative tests are today enabled using third-party cookies, where each brand has information on the identity of visitors to another website. With the imminent elimination of third-party cookies, such A/B tests will become untenable. We propose a two-stage experimental design, where the two brands only need to agree on high-level aggregate parameters of the experiment to test the alternate experiences. Our design respects the privacy of customers. We propose an estimator of the Average Treatment Effect (ATE), show that it is unbiased, and theoretically compute its variance. Our demonstration describes how a marketer for a brand can design such an experiment and analyze the results. On real and simulated data, we show that the approach provides a valid estimate of the ATE with low variance and is robust to the proportion of visitors overlapping across the brands.

[Figure 1: Proposed experimental design for two brands to estimate the average treatment effect (ATE) without third-party cookies. The brands only agree on a definition of clusters (1 & 2), the two treatments (orange and blue "buy" buttons), and the proportions in which to randomize.]
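As a rough illustration of the setting (not the paper's estimator or two-stage design), the sketch below simulates a cluster-randomized test and computes a simple difference-in-means ATE estimate over clusters; the cluster sizes, conversion rates, and effect size are made up.

```python
# Toy sketch of a cluster-randomized A/B test: clusters (not individual
# visitors) are assigned to treatment, and the ATE is estimated from
# per-cluster aggregate outcomes. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
num_clusters, visitors_per_cluster, true_ate = 200, 50, 0.05

# Randomize clusters to treatment or control.
treated = rng.random(num_clusters) < 0.5

# Simulated conversion outcomes with a cluster-level random effect.
cluster_effect = rng.normal(0.0, 0.02, size=num_clusters)
base_rate = 0.10
cluster_means = []
for c in range(num_clusters):
    p = np.clip(base_rate + cluster_effect[c] + (true_ate if treated[c] else 0.0), 0, 1)
    outcomes = rng.random(visitors_per_cluster) < p
    cluster_means.append(outcomes.mean())
cluster_means = np.array(cluster_means)

# Difference in means over clusters as a simple ATE estimate.
ate_hat = cluster_means[treated].mean() - cluster_means[~treated].mean()
print(f"estimated ATE = {ate_hat:.3f} (true {true_ate})")
```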


A beginner's guide to cookies

FOX News

Cookies may sound deliciously appealing on the surface. Allowing cookies on your devices and browsers has a sweet side and an occasional bitter aftertaste if not managed properly. Understanding the basics of how cookies work with browsers will go a long way toward helping you know when to accept or reject them. While cookies are designed in the hopes of giving you a more pleasurable browsing or surfing experience, many have feared that accepting cookies means that you are willingly giving away your personal information and making yourself vulnerable to hackers and malware.
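For a concrete sense of what "accepting a cookie" means, the snippet below shows the small name=value pair a server asks a browser to store and send back; the cookie name, value, and attributes are invented for the example.

```python
# A minimal illustration (not from the article) of what a cookie is.
from http.cookies import SimpleCookie

# What a server emits in its response headers.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["max-age"] = 3600   # expire after one hour
outgoing["session_id"]["httponly"] = True  # hide from page scripts
print(outgoing.output())                   # "Set-Cookie: session_id=abc123; ..."

# What the browser later sends back, and how the server reads it.
incoming = SimpleCookie("session_id=abc123")
print(incoming["session_id"].value)        # "abc123"
```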


Getting Ahead of the Cookie Curve Using AI Relevance Tools

#artificialintelligence

So much has changed in the past two years about the retail space, including customer expectations and demands from retailers. In an analog world where a consumer will walk into a physical store for their shopping needs, whether they are on a mission for a specific item or going in for inspiration, what they expect is that they'll find what's relevant to them – often with a knowledgeable store associate to guide them. Now, in e-commerce, 90% of consumers expect the online experience to be equal to, if not better than, the in-store experience. This means finding what they need quickly and consistently each time they go to an online retail destination. Without a physical store associate to provide them with guidance, retailers can utilize technologies and advanced search platforms that allow consumers to easily and efficiently navigate and discover what they need.


A Quick Introduction to Federated Learning of Cohorts [FLoC]

#artificialintelligence

Federated Learning is a relatively new and evolving machine learning technique that decentralizes training, moving it from one central machine or data center to multiple devices, including mobile phones. Federated Learning of Cohorts, or FLoC for short, is a form of web tracking enabled through Federated Learning in which individuals are grouped into "cohorts" based on similar browsing behavior. Machine learning is a branch of Artificial Intelligence and computer science that leverages data and algorithms to make computers mimic human learning and decision making. Federated Learning takes advantage of edge computing principles, bringing computation and data storage closer to where they are needed. This principle allows for reduced response times, bandwidth conservation, and personalization, amongst other benefits.
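To make the "training stays on the device" idea concrete, here is a minimal federated-averaging sketch. It is a generic toy (a linear model on synthetic per-device data), not Chrome's FLoC implementation; the model, learning rate, and round counts are assumptions.

```python
# Minimal federated-averaging sketch: each device fits an update on its own
# data, and only the averaged model parameters leave the devices.
import numpy as np

rng = np.random.default_rng(1)
num_devices, points_per_device, dim = 10, 20, 3
true_w = np.array([0.5, -1.0, 2.0])

# Each device holds its own local data; it is never pooled centrally.
local_data = []
for _ in range(num_devices):
    X = rng.normal(size=(points_per_device, dim))
    y = X @ true_w + rng.normal(scale=0.1, size=points_per_device)
    local_data.append((X, y))

global_w = np.zeros(dim)
for _ in range(50):                      # communication rounds
    updates = []
    for X, y in local_data:
        w = global_w.copy()
        for _ in range(5):               # local gradient steps
            grad = X.T @ (X @ w - y) / len(y)
            w -= 0.1 * grad
        updates.append(w)
    # The server only averages parameter updates, never sees raw data.
    global_w = np.mean(updates, axis=0)

print(global_w)  # should approach true_w
```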


Predictions Series 2022: How to Win in an Opt-In Era

#artificialintelligence

Opt-in doomsayers believe this legislation could cripple the entire advertising industry because consumers will have more meaningful control over their privacy, and authenticated audiences will shrink as a result. However, the trends that have shaped the market can be bucked, and publishers and advertisers have an exciting opportunity to create a new ecosystem in compliance with the opt-in marketplace that benefits everyone involved – including consumers. Further, the browser and device manufacturer changes already in progress are moving the industry towards a more logged-in environment, in which it becomes easier for consumers to opt in as they authenticate. As we look towards an opt-in era in the future, it's important that publishers and marketers consider how the industry arrived at this point, and the lessons they can take away from this journey. Under the opt-out default, it's easy to see that the consumer experience has been lacking, and a lot of that falls on technology. The opt-out default enabled the propagation of third-party cookies and the collection of data – often in a way that was not as transparent as it could have been for consumers.


Why data-driven marketing needs artificial intelligence and machine learning

#artificialintelligence

No one needs reminding that the life of the third-party cookie is increasingly finite, with only just over a year left before it is obsolete. At the same time, privacy regulation is tightening and consumers are getting more and more data-savvy, with 90% wanting more data privacy built into their devices. Marketers will lose a lot of the data they have, to date, taken for granted, meaning that more decisions will have to be made with less data. Despite alternatives to the third-party cookie progressing hugely over the past two years, the way ahead is still daunting for many marketers. Targeting and measurement are the areas where the lack of third-party data will be felt most strongly.


Machine Learning at the Edge

#artificialintelligence

I'm really excited to talk about advances in federated learning at the edge with you. When I think about the edge, I often think about small embedded devices, IoT, other types of things that might have a small computer in them, and I might not even realize that. I recently learned that these little scooters that are all over my city in Berlin, Germany, and maybe even yours as well, are collecting quite a lot of data and sending it.

When I think about the data they might be collecting, and when I put on my data science and machine learning hat and think about the problems that they might want to solve, they might want to know about maintenance. They might want to know about road and weather conditions. They might want to know about driver performance. Really, the ultimate question they're trying to answer is this last one, which is, is this going to result in some problem for the scooter, or for the human, or for the other things around the scooter and the human? These are the types of questions we ask when we think about data and machine learning.

When we think about it on the edge, or with embedded small systems, this often becomes a problem because traditional machine learning needs quite a lot of extra information to answer these questions. Let's take a look at a traditional machine learning system and investigate how it might go about collecting this data and answering this question. First, all the data would have to be aggregated and collected into a data lake. It might need to be standardized, or munged, or cleaned, or something done with it beforehand. Then, eventually, that data is pulled, usually by a data science team or by scripts written by data engineering or data scientists on the team.
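The centralized pipeline the speaker contrasts with edge approaches can be sketched in a few lines; the scooter telemetry fields and the least-squares "model" below are invented stand-ins for illustration, not anything from the talk.

```python
# Toy sketch of the traditional, centralized pipeline: every device ships its
# raw telemetry to a central store, the data is cleaned and standardized, and
# only then is a model trained on the pooled data.
import numpy as np

rng = np.random.default_rng(3)

# Step 1: raw telemetry from many scooters is aggregated into one "data lake".
def device_telemetry(n=100):
    return {
        "vibration": rng.normal(1.0, 0.3, n),
        "battery_temp": rng.normal(30.0, 5.0, n),
        "needs_maintenance": rng.random(n) < 0.1,
    }

data_lake = [device_telemetry() for _ in range(50)]

# Step 2: standardize / clean the pooled data.
X = np.concatenate(
    [np.column_stack([d["vibration"], d["battery_temp"]]) for d in data_lake]
)
y = np.concatenate([d["needs_maintenance"] for d in data_lake]).astype(float)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 3: a central team trains a model on the pooled data
# (a least-squares scorer standing in for a real classifier).
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("maintenance-risk weights:", w)
```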